AITopics | geodesic multi-modal mixup

Collaborating Authors

geodesic multi-modal mixup

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Geodesic Multi-Modal Mixup for Robust Fine-Tuning

Neural Information Processing SystemsDec-26-2025, 11:52:59 GMT

Pre-trained multi-modal models, such as CLIP, provide transferable embeddings and show promising results in diverse applications. However, the analysis of learned multi-modal embeddings is relatively unexplored, and the embedding transferability can be improved. In this work, we observe that CLIP holds separated embedding subspaces for two different modalities, and then we investigate it through the lens of \textit{uniformity-alignment} to measure the quality of learned representation. Both theoretically and empirically, we show that CLIP retains poor uniformity and alignment even after fine-tuning. Such a lack of alignment and uniformity might restrict the transferability and robustness of embeddings. To this end, we devise a new fine-tuning method for robust representation equipping better alignment and uniformity. First, we propose a \textit{Geodesic Multi-Modal Mixup} that mixes the embeddings of image and text to generate hard negative samples on the hypersphere.

geodesic multi-modal mixup, name change, robust fine-tuning, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.36)

Add feedback

Geodesic Multi-Modal Mixup for Robust Fine-Tuning

Neural Information Processing SystemsJan-19-2025, 17:44:37 GMT

geodesic multi-modal mixup, representation, robust fine-tuning, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.40)

Add feedback